BayesMD: Flexible Biological Modeling for Motif Discovery

Authors

  • Man-Hung Eric Tang
  • Anders Krogh
  • Ole Winther
Abstract

We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on transcription factor (TF) databases in order to extract the typical properties of TF binding sites. In a similar fashion we train organism-specific priors for the background sequences. Lastly, we use a prior over the position of binding sites. This prior represents information complementary to the motif and background priors coming from conservation, local sequence complexity, nucleosome occupancy, etc. and assumptions about the number of occurrences. The Bayesian inference is carried out using a combination of exact marginalization (multinomial parameters) and sampling (over the position of sites). Robust sampling results are achieved using the advanced sampling method parallel tempering. In a post-analysis step candidate motifs with high marginal probability are found by searching among those motifs that contain sites that occur frequently. Thereby, maximum a posteriori inference for the motifs is avoided and the marginal probabilities can be used directly to assess the significance of the findings. The framework is benchmarked against other methods on a number of real and artificial data sets. The accompanying prediction server, documentation, software, models and data are available from http://bayesmd.binf.ku.dk/.
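The "exact marginalization (multinomial parameters)" step mentioned in the abstract can be illustrated with the standard Dirichlet-multinomial identity: once the nucleotide probabilities at a motif position are integrated out under a Dirichlet prior, the probability of the observed nucleotides depends only on their counts, and a mixture of Dirichlets gives a weighted sum of such terms. The sketch below is a minimal illustration of that identity, not the BayesMD implementation; the function names, the mixture weights and the pseudocount vectors are hypothetical, whereas the paper trains its priors on TF databases.

```python
from math import exp, lgamma, log


def log_dirichlet_marginal(counts, alpha):
    """Log probability of an ordered column of nucleotides with the given
    counts (A, C, G, T), with the multinomial parameters integrated out
    under a Dirichlet(alpha) prior."""
    total_alpha = sum(alpha)
    total_count = sum(counts)
    result = lgamma(total_alpha) - lgamma(total_alpha + total_count)
    for n, a in zip(counts, alpha):
        result += lgamma(a + n) - lgamma(a)
    return result


def log_mixture_marginal(counts, weights, alphas):
    """Log marginal probability under a mixture of Dirichlets.

    weights: mixture weights summing to 1 (hypothetical values here)
    alphas:  one pseudocount vector per mixture component
    """
    terms = [
        log(w) + log_dirichlet_marginal(counts, a)
        for w, a in zip(weights, alphas)
    ]
    # Log-sum-exp for numerical stability.
    largest = max(terms)
    return largest + log(sum(exp(t - largest) for t in terms))


# Illustrative (not trained) two-component prior: one component favours
# strongly conserved columns, the other is close to uniform.
example_weights = [0.6, 0.4]
example_alphas = [[0.1, 0.1, 0.1, 0.1], [2.0, 2.0, 2.0, 2.0]]

conserved_column = [9, 0, 1, 0]  # counts of A, C, G, T across 10 sites
mixed_column = [3, 2, 3, 2]

log_p_conserved = log_mixture_marginal(
    conserved_column, example_weights, example_alphas
)
log_p_mixed = log_mixture_marginal(
    mixed_column, example_weights, example_alphas
)
```

Because the parameters are integrated out analytically, a sampler only has to move over site positions, which is the division of labour the abstract describes.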

Similar resources

Motif discovery programs

BayesMD [1] is a probabilistic, Bayesian model for predicting novel transcription factor binding sites. Biological information about binding site properties, background sequence models, occurrence and positional preferences are built into the model in a modular fashion. Mixture prior parameters for the motif and background are trained using information on TFBSs and organism-specific promoter sequ...

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

Designing HMMs: Motif discovery and modeling

Position Specific Scoring Matrices capture the distribution of residues observed in each position in a conserved motif, but are not a good model for variable length motifs, recognition of new instances with insertions and deletions, and positional dependencies. Moreover, PSSMs can be used to search for instances of an ungapped motif in an unlabeled sequence, but do not lend themselves to precis...

F3Dock: A Fast, Flexible and Fourier Based Approach to Protein-Protein Docking

Protein interactions, key to many biological processes, involve induced fit between flexible proteins, which typically undergo conformational changes. Modeling this flexible protein-protein docking is an important step in drug discovery, structure determination and understanding structure-function relationships. In this paper, we present F3Dock, a Fast Flexible and Fourier based dockin...

An Evolutionary Model of DNA Substring Distribution

DNA sequence analysis methods, such as motif discovery, gene detection or phylogeny reconstruction, can often provide important input for biological studies. Many such methods require a background model, representing the expected distribution of short substrings in a given DNA region. Most current techniques for modeling this distribution disregard the evolutionary processes underlying DNA f...

Journal:
  • Journal of Computational Biology: A Journal of Computational Molecular Cell Biology

Volume 15, Issue 10

Publication date: 2008